HCS: Hierarchical Cut Selection for Efficiently Processing Queries on Data Columns using Hierarchical Bitmap Indices
Abstract
When data are large and query processing workloads consist of data selection and aggregation operations (as in online analytical processing), column-oriented data stores are generally the preferred choice of data organization, because they enable effective data compression, leading to significantly reduced IO. Most column-store architectures leverage bitmap indices, which themselves can be compressed, for answering queries over data columns. Column domains (e.g., geographical data, categorical data, biological taxonomies, organizational data) are hierarchical in nature, and it may be more advantageous to create hierarchical bitmap indices, which can help answer queries over different sub-ranges of the domain. However, given a query workload, it is critical to choose the appropriate subset of bitmap indices from the given hierarchy. Thus, in this paper, we introduce the cut-selection problem, which aims to identify a subset (cut) of the nodes of the domain hierarchy, with the appropriate bitmap indices. We discuss inclusive, exclusive, and hybrid strategies for cut-selection and show that the hybrid strategy can be computed efficiently and returns optimal (in terms of IO) results when there are no memory constraints. We also show that when there is a memory availability constraint, the cut-selection problem becomes difficult; we therefore present efficient cut-selection strategies that return close-to-optimal results, especially in situations where the memory limitations are very strict (i.e., the data and the hierarchy are much larger than the available memory). Experimental results confirm the efficiency and effectiveness of the proposed cut-selection algorithms.
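To make the inclusive and exclusive ideas concrete, here is a minimal Python sketch. It is not the paper's algorithm: the toy column, the one-level hierarchy, the function names, and the use of "number of bitmaps read" as a stand-in for IO cost are all assumptions made for illustration. It assumes an inclusive plan ORs the bitmaps of the values inside the query range, while an exclusive plan starts from an ancestor's bitmap and removes the values outside the range.

```python
from functools import reduce

# Hypothetical toy column over a 4-value domain with a single parent node (the root).
column = ["a", "b", "c", "d", "a", "c", "d", "d"]
leaves = ["a", "b", "c", "d"]

def leaf_bitmap(value):
    """Return an int whose bit i is set when row i of the column holds `value`."""
    bits = 0
    for row, v in enumerate(column):
        if v == value:
            bits |= 1 << row
    return bits

def union(bitmaps):
    """Bitwise OR of an iterable of bitmaps."""
    return reduce(lambda x, y: x | y, bitmaps, 0)

leaf_bitmaps = {v: leaf_bitmap(v) for v in leaves}
# In a hierarchical bitmap index, an internal node covers all of its descendants.
root_bitmap = union(leaf_bitmaps.values())

def inclusive(query_values):
    """OR the leaf bitmaps inside the query range; reads one bitmap per value."""
    result = union(leaf_bitmaps[v] for v in query_values)
    return result, len(query_values)

def exclusive(query_values):
    """Start from the root bitmap and clear the rows of values outside the range."""
    outside = [v for v in leaves if v not in query_values]
    result = root_bitmap & ~union(leaf_bitmaps[v] for v in outside)
    return result, 1 + len(outside)

query = ["a", "b", "c"]
inc_result, inc_reads = inclusive(query)
exc_result, exc_reads = exclusive(query)
print(inc_result == exc_result, inc_reads, exc_reads)
```

For this query both plans select the same rows, but the exclusive plan touches two bitmaps (the root and `d`) instead of three. Which plan is cheaper depends on the query range and on which nodes have bitmaps available, which is the kind of trade-off a cut-selection strategy has to weigh.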
Similar resources
Binning Strategy for Hierarchical Bitmap Indices with Large Scale Domain Hierarchy
As bitmap indices are useful for OLAP queries over low-cardinality data columns, they are frequently used in data warehouses. In many data warehouse applications, the domain of a column tends to be hierarchical, such as categorical data and geographical data. When the domain of a column is hierarchical in nature, the performance of query processing can be improved significantly by leveraging hie...
Compressed Spatial Hierarchical Bitmap (cSHB) Indexes for Efficiently Processing Spatial Range Query Workloads
In most spatial data management applications, objects are represented in terms of their coordinates in a 2-dimensional space and search queries in this space are processed using spatial index structures. On the other hand, bitmap-based indexing, especially thanks to the compression opportunities bitmaps provide, has been shown to be highly effective for query processing workloads including sele...
Processing relational OLAP queries with UB-Trees and multidimensional hierarchical clustering
Multidimensional access methods like the UB-Tree can be used to accelerate almost any query processing operation, provided that proper query processing algorithms are used. Relational queries or SQL queries consist of restrictions, projections, ordering, grouping and aggregation, and join operations. In the presence of multidimensional restrictions or sorting, multidimensional range query or Tetris algorit...
Hierarchical Bitmap Index: An Efficient and Scalable Indexing Technique for Set-Valued Attributes
Set-valued attributes are convenient to model complex objects occurring in the real world. Currently available database systems support the storage of set-valued attributes in relational tables but contain no primitives to query them efficiently. Queries involving set-valued attributes either perform full scans of the source data or make multiple passes over single-value indexes to reduce the n...
A Data Mining Approach for selecting Bitmap Join Indices
Index selection is one of the most important decisions to make in the physical design of relational data warehouses. Indices significantly reduce the cost of processing complex OLAP queries, but require storage and induce maintenance overhead. Two main types of indices are available: mono-attribute indices (e.g., B-tree, bitmap, hash, etc.) and multi-attribute indices (join indices, bitmap...